15 research outputs found
A Concurrency-Optimal Binary Search Tree
The paper presents the first \emph{concurrency-optimal} implementation of a
binary search tree (BST). The implementation, based on a standard sequential
implementation of an internal tree, ensures that every \emph{schedule}, i.e.,
every interleaving of steps of the sequential code, is accepted unless
linearizability is violated. To ensure this property, we use a novel read-write
locking scheme that protects tree \emph{edges} in addition to nodes.
Our implementation outperforms the state-of-the-art BSTs on most basic
workloads, which suggests that optimizing the set of accepted schedules of the
sequential code can be an adequate design principle for efficient concurrent
data structures.
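To make the edge-locking idea concrete, here is a minimal, insert-only sketch in Python. It is our illustration, not the paper's algorithm: plain mutexes stand in for the paper's read-write locking scheme, the class names are ours, and deletion (where most of the subtlety lies) is omitted.

```python
import threading

class Node:
    """Internal BST node. One lock per outgoing edge approximates the
    paper's idea of protecting tree edges, not just nodes (simplified:
    plain mutexes instead of reader-writer locks)."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.left_lock = threading.Lock()   # protects the (self -> left) edge
        self.right_lock = threading.Lock()  # protects the (self -> right) edge

class EdgeLockedBST:
    """Insert-only sketch: traversal is lock-free; only the single edge
    being modified is locked, so disjoint inserts never contend."""
    def __init__(self):
        self.root = None
        self.root_lock = threading.Lock()   # protects the (sentinel -> root) edge

    def insert(self, key):
        with self.root_lock:
            if self.root is None:
                self.root = Node(key)
                return True
        node = self.root
        while True:
            if key == node.key:
                return False                 # key already present
            lock = node.left_lock if key < node.key else node.right_lock
            with lock:                       # lock only the edge we may modify
                child = node.left if key < node.key else node.right
                if child is None:            # edge still vacant: attach here
                    new = Node(key)
                    if key < node.key:
                        node.left = new
                    else:
                        node.right = new
                    return True
            node = child                     # edge occupied: keep descending

    def contains(self, key):
        node = self.root                     # reads take no locks at all
        while node is not None:
            if key == node.key:
                return True
            node = node.left if key < node.key else node.right
        return False
```

Because there is no deletion, a node observed during the unlocked descent can never disappear, which is what makes the lock-free traversal safe in this simplified setting.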
Parallel Combining: Benefits of Explicit Synchronization
A parallel batched data structure is designed to process synchronized batches of operations on the data structure using a parallel program. In this paper, we propose parallel combining, a technique that implements a concurrent data structure from a parallel batched one. The idea is that we explicitly synchronize concurrent operations into batches: one of the processes becomes a combiner which collects concurrent requests and initiates a parallel batched algorithm involving the owners (clients) of the collected requests. Intuitively, the cost of synchronizing the concurrent calls can be compensated by running the parallel batched algorithm.
We validate the intuition via two applications. First, we use parallel combining to design a concurrent data structure optimized for read-dominated workloads, taking a dynamic graph data structure as an example. Second, we use a novel parallel batched priority queue to build a concurrent one. In both cases, we obtain performance gains with respect to the state-of-the-art algorithms.
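The combining mechanism can be sketched as follows. This is our simplification, not the paper's construction: the combiner here applies the collected batch sequentially (as in classic flat combining), whereas parallel combining has the combiner enlist the waiting clients to run a parallel batched algorithm; `CombiningCounter` is a hypothetical example structure.

```python
import threading

class CombiningCounter:
    """Combining sketch: concurrent add() calls publish requests; one
    caller becomes the combiner, drains the request list, and applies
    the whole batch while the others wait to be notified."""
    def __init__(self):
        self.value = 0
        self.combiner_lock = threading.Lock()   # held by the current combiner
        self.requests = []                      # pending (amount, done-event) pairs
        self.requests_lock = threading.Lock()   # protects the request list

    def add(self, amount):
        done = threading.Event()
        with self.requests_lock:
            self.requests.append((amount, done))    # publish the request
        while not done.is_set():
            if self.combiner_lock.acquire(blocking=False):  # try to combine
                try:
                    with self.requests_lock:
                        batch, self.requests = self.requests, []
                    for amt, ev in batch:           # apply the whole batch
                        self.value += amt
                        ev.set()                    # release the request's owner
                finally:
                    self.combiner_lock.release()
            else:
                done.wait(0.001)                    # client: wait for a combiner
```

The point of the technique is that one traversal of the shared state serves many requests; in the paper's full scheme the batch itself is then processed in parallel by the waiting clients.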
The splay-list: A distribution-adaptive concurrent skip-list
The design and implementation of efficient concurrent data structures have
seen significant attention. However, most of this work has focused on
concurrent data structures providing good \emph{worst-case} guarantees. In real
workloads, objects are often accessed at different rates, since access
distributions may be non-uniform. Efficient distribution-adaptive data
structures are known in the sequential case, e.g., splay trees; however,
they are often hard to translate efficiently to the concurrent case.
In this paper, we investigate distribution-adaptive concurrent data
structures and propose a new design called the splay-list. At a high level, the
splay-list is similar to a standard skip-list, with the key distinction that
the height of each element adapts dynamically to its access rate: popular
elements ``move up,'' whereas rarely-accessed elements decrease in height. We
show that the splay-list provides order-optimal amortized complexity bounds for
a subset of operations while being amenable to efficient concurrent
implementation. Experimental results show that the splay-list can leverage
distribution-adaptivity to improve on the performance of classic concurrent
designs, and can outperform the only previously-known distribution-adaptive
design in certain settings.
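A sequential Python sketch of the adaptation rule (our illustration only: the paper's concurrent algorithm and its exact promotion threshold differ; here a node moves up one level once it holds more than a 2^-level fraction of all accesses, and new nodes always start at the bottom level):

```python
class SplayListSketch:
    """Sequential sketch of a distribution-adaptive skip-list: node
    heights grow with access frequency instead of being random."""
    MAX_LEVEL = 8

    def __init__(self):
        # head sentinel is linked at every level
        self.head = {'key': float('-inf'), 'next': [None] * self.MAX_LEVEL}
        self.total = 0          # total number of contains() calls so far

    def _find_preds(self, key):
        """Predecessor of `key` at every level."""
        preds = [None] * self.MAX_LEVEL
        cur = self.head
        for lvl in range(self.MAX_LEVEL - 1, -1, -1):
            while cur['next'][lvl] is not None and cur['next'][lvl]['key'] < key:
                cur = cur['next'][lvl]
            preds[lvl] = cur
        return preds

    def insert(self, key):
        # new elements enter at height 1 and earn height through accesses
        node = {'key': key, 'next': [None] * self.MAX_LEVEL, 'hits': 0, 'level': 1}
        preds = self._find_preds(key)
        node['next'][0] = preds[0]['next'][0]
        preds[0]['next'][0] = node

    def contains(self, key):
        self.total += 1
        preds = self._find_preds(key)
        node = preds[0]['next'][0]
        if node is None or node['key'] != key:
            return False
        node['hits'] += 1
        # promotion rule (simplified): a node holding more than a
        # 2^-level fraction of all accesses "moves up" one level
        lvl = node['level']
        if lvl < self.MAX_LEVEL and node['hits'] * (2 ** lvl) > self.total:
            node['next'][lvl] = preds[lvl]['next'][lvl]
            preds[lvl]['next'][lvl] = node
            node['level'] += 1
        return True
```

Demotion of rarely-accessed elements, and the lock-based concurrent version, are omitted; the sketch only shows how height can track access rate.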
Cytokine profile of patients with combined cardiac and ophthalmic pathology
Combined cardiac and ophthalmic pathology is highly prevalent in the older age groups of the population, and the two conditions share common pathogenetic mechanisms, which certainly include disturbances of the cytokine profile. However, the blood cytokine profile has scarcely been analysed in elderly patients with coronary heart disease combined with glaucoma. The aim of this study was to examine the cytokine profile of patients with combined cardiac and ophthalmic pathology. The study was carried out at the Tambov branch of the Academician S.N. Fyodorov Eye Microsurgery Complex (MNTK) in two groups: patients with coronary heart disease combined with glaucoma (n=58) and patients with coronary heart disease alone (n=49), both groups aged 60-74 years. Glaucoma was diagnosed according to the criteria of the National Glaucoma Guidelines; coronary heart disease was diagnosed using electrocardiographic, echocardiographic, radiographic, and enzyme studies. Plasma cytokine levels were measured on a Becton Dickinson FACS Canto 2 instrument (USA) with a CBA kit (BD Biosciences, USA). Significant differences were found between the age-matched groups for most cytokines, predominantly elevations in the patients with combined cardiac and ophthalmic pathology relative to the group with coronary heart disease alone. Plasma levels of IL-5, IL-12, IFN-γ, and TNF-α were significantly higher in patients with coronary heart disease combined with glaucoma than in patients with coronary heart disease alone. The largest increases among the cytokines examined were for IL-6 and IL-17, which reached 23.8±1.1 pg/ml and 20.2±1.7 pg/ml in the patients with combined cardiac and ophthalmic pathology versus 6.3±0.3 pg/ml and 7.9±0.5 pg/ml, respectively, in the patients with coronary heart disease. At the same time, the levels of IL-4 and IL-10 fell substantially, to 2.2±0.2 pg/ml and 6.4±0.4 pg/ml versus 4.8±0.3 pg/ml and 11.9±0.6 pg/ml. Logistic regression was used to estimate the relative risk associated with the studied blood cytokines and to build unadjusted and adjusted models, according to which the closest association with the risk of developing coronary heart disease combined with glaucoma was found for IL-6 and IL-17, with unadjusted relative risks of 2.87 and 2.71, respectively (p<0.001). In the adjusted model, the association of IL-6 with combined coronary heart disease and glaucoma rose to 2.92 (CI 2.80-3.27, p=0.004), while that of IL-17 fell to 2.64 (CI 2.51-2.85, p=0.003). Significant associations of IL-4, IL-5, IL-12, IFN-γ, and TNF-α with the combined disease were also established. The study demonstrated new associations between systemic cytokines and the risk of developing coronary heart disease combined with glaucoma.
Scalable belief propagation via relaxed scheduling
The ability to leverage large-scale hardware parallelism has been one of the key enablers of the accelerated recent progress in machine learning. Consequently, there has been considerable effort invested into developing efficient parallel variants of classic machine learning algorithms. However, despite the wealth of knowledge on parallelization, some classic machine learning algorithms often prove hard to parallelize efficiently while maintaining convergence. In this paper, we focus on efficient parallel algorithms for the key machine learning task of inference on graphical models, in particular on the fundamental belief propagation algorithm. We address the challenge of efficiently parallelizing this classic paradigm by showing how to leverage scalable relaxed schedulers in this context. We present an extensive empirical study, showing that our approach outperforms previous parallel belief propagation implementations both in terms of scalability and in terms of wall-clock convergence time, on a range of practical applications.
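Relaxed schedulers of this kind are typically MultiQueue-style structures; here is a minimal sketch (our illustration, assuming the classic two-choice deletion rule over several sub-heaps; in residual belief propagation the priority would be a message's residual):

```python
import heapq
import random

class RelaxedMultiQueue:
    """MultiQueue-style relaxed priority scheduler sketch: pop() inspects
    the tops of two randomly chosen sub-queues and removes the better
    one, so it returns an element *near* the global minimum rather than
    the exact minimum. Giving up strict priority order removes the
    single-heap bottleneck that limits scalability."""
    def __init__(self, num_queues=4, seed=0):
        self.queues = [[] for _ in range(num_queues)]
        self.rng = random.Random(seed)

    def push(self, priority, item):
        # inserts go to a random sub-queue (no global coordination)
        q = self.rng.choice(self.queues)
        heapq.heappush(q, (priority, item))

    def pop(self):
        # two-choice deletion: compare the tops of two random sub-queues
        nonempty = [q for q in self.queues if q]
        if not nonempty:
            return None
        a = self.rng.choice(nonempty)
        b = self.rng.choice(nonempty)
        q = a if a[0] <= b[0] else b
        return heapq.heappop(q)
```

In a concurrent implementation each sub-queue carries its own lock, so threads rarely contend; the sketch above keeps only the relaxation logic, not the synchronization.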
Provably and Practically Efficient Granularity Control
Over the past decade, many programming languages and systems for parallel computing have been developed, e.g., Fork/Join and Habanero Java, Parallel Haskell, Parallel ML, and X10. Although these systems raise the level of abstraction for writing parallel code, performance continues to require labor-intensive optimizations for coarsening the granularity of parallel executions. In this paper, we present provably and practically efficient techniques for controlling granularity within the run-time system of the language. Our starting point is "oracle-guided scheduling", a result from the functional-programming community showing that granularity can be controlled by an "oracle" that can predict the execution time of parallel code. We give an algorithm for implementing such an oracle and prove that it has the desired theoretical properties under the nested-parallel programming model. We implement the oracle in C++ by extending Cilk and evaluate its practical performance. The results show that our techniques can essentially eliminate hand-tuning while closely matching the performance of hand-tuned code.
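The oracle idea can be sketched as follows. This is our Python illustration, not the paper's C++/Cilk implementation: the grain size and the calibration scheme are assumptions, and the two recursive calls that a real runtime would fork in parallel run sequentially here.

```python
import time

GRAIN_US = 100  # target sequential grain in microseconds (assumed constant)

class Oracle:
    """Oracle-guided granularity control sketch: predict a task's run
    time as cost(n) * C, where C (seconds per unit of abstract cost) is
    calibrated from measured sequential runs. Tasks predicted to fit
    within the grain run sequentially; larger tasks are split."""
    def __init__(self, cost):
        self.cost = cost    # user-supplied abstract cost, e.g. n for a map
        self.c = None       # calibrated seconds per unit of cost

    def run(self, work, n, split):
        predicted = None if self.c is None else self.cost(n) * self.c
        if predicted is not None and predicted * 1e6 <= GRAIN_US:
            work(n)                          # predicted small: run sequentially
        elif n <= 1:
            start = time.perf_counter()      # base case: measure to calibrate C
            work(n)
            elapsed = time.perf_counter() - start
            self.c = elapsed / max(self.cost(n), 1)
        else:
            lo, hi = split(n)
            # a real runtime would fork these two calls in parallel
            self.run(work, lo, split)
            self.run(work, hi, split)
```

A usage example: `Oracle(cost=lambda n: n).run(do_chunk, 1000, lambda n: (n // 2, n - n // 2))` splits a 1000-element task until each piece is predicted to fit within the grain, eliminating the hand-chosen cutoff.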